智能论文笔记

The bi-objective multimodal car-sharing problem

Miriam Enzi , Sophie N. Parragh , Jakob Puchinger

分类：人工智能

2020-10-18

双目标多模式共享问题（BIO-MMCP）的目的是确定旅行的最佳运输分配方式，并安排可用汽车和用户的路线，同时最大程度地减少成本并最大程度地提高用户满意度。我们从以用户为中心的角度研究了生物MMCP。由于用户满意度是共享移动性系统中的关键方面，因此我们在第二个目标中考虑用户偏好。用户可以在一天中的不同时间选择并对其首选的运输方式进行排名。通过这种方式，我们可以解释整个计划范围内的不同交通状况。我们研究问题的不同变体。在基本问题中，用户必须实现的任务顺序是预先固定的，旅行时间以及偏好在计划范围上是恒定的。在变体2中，引入了与时间有关的旅行时间和偏好。在变体3中，我们在允许其他路由决策时检查了挑战。变体4集成了变体2和3。在最后一个变体中，我们开发了一种分支和切割算法，该算法嵌入了两个双向目标框架中，即$ \ epsilon $ -constraint方法和一种加权二进制搜索方法。计算实验表明，分支和切割算法的表现优于MIP公式，我们讨论了沿Pareto边境的更改解决方案。

translated by 谷歌翻译

ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports

Katharina Jeblick , Balthasar Schachtner , Jakob Dexl , Andreas Mittermeier , Anna Theresa Stüber , Johanna Topalis , Tobias Weber , Philipp Wesp , Bastian Sabel , Jens Ricke

分类：自然语言处理 | 机器学习

2022-12-30

The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.

translated by 谷歌翻译

High-fidelity Direct Contrast Synthesis from Magnetic Resonance Fingerprinting

Ke Wang , Mariya Doneva , Jakob Meineke , Thomas Amthor , Ekin Karasan , Fei Tan , Jonathan I. Tamir , Stella X. Yu , Michael Lustig

分类：计算机视觉

2022-12-21

Magnetic Resonance Fingerprinting (MRF) is an efficient quantitative MRI technique that can extract important tissue and system parameters such as T1, T2, B0, and B1 from a single scan. This property also makes it attractive for retrospectively synthesizing contrast-weighted images. In general, contrast-weighted images like T1-weighted, T2-weighted, etc., can be synthesized directly from parameter maps through spin-dynamics simulation (i.e., Bloch or Extended Phase Graph models). However, these approaches often exhibit artifacts due to imperfections in the mapping, the sequence modeling, and the data acquisition. Here we propose a supervised learning-based method that directly synthesizes contrast-weighted images from the MRF data without going through the quantitative mapping and spin-dynamics simulation. To implement our direct contrast synthesis (DCS) method, we deploy a conditional Generative Adversarial Network (GAN) framework and propose a multi-branch U-Net as the generator. The input MRF data are used to directly synthesize T1-weighted, T2-weighted, and fluid-attenuated inversion recovery (FLAIR) images through supervised training on paired MRF and target spin echo-based contrast-weighted scans. In-vivo experiments demonstrate excellent image quality compared to simulation-based contrast synthesis and previous DCS methods, both visually as well as by quantitative metrics. We also demonstrate cases where our trained model is able to mitigate in-flow and spiral off-resonance artifacts that are typically seen in MRF reconstructions and thus more faithfully represent conventional spin echo-based contrast-weighted images.

translated by 谷歌翻译

Medical Diagnosis with Large Scale Multimodal Transformers -- Leveraging Diverse Data for More Accurate Diagnosis

Firas Khader , Gustav Mueller-Franzes , Tianci Wang , Tianyu Han , Soroosh Tayebi Arasteh , Christoph Haarburger , Johannes Stegmaier , Keno Bressem , Christiane Kuhl , Sven Nebelung

分类：机器学习 | 人工智能

2022-12-18

Multimodal deep learning has been used to predict clinical endpoints and diagnoses from clinical routine data. However, these models suffer from scaling issues: they have to learn pairwise interactions between each piece of information in each data type, thereby escalating model complexity beyond manageable scales. This has so far precluded a widespread use of multimodal deep learning. Here, we present a new technical approach of "learnable synergies", in which the model only selects relevant interactions between data modalities and keeps an "internal memory" of relevant data. Our approach is easily scalable and naturally adapts to multimodal data inputs from clinical routine. We demonstrate this approach on three large multimodal datasets from radiology and ophthalmology and show that it outperforms state-of-the-art models in clinically relevant diagnosis tasks. Our new approach is transferable and will allow the application of multimodal deep learning to a broad set of clinically relevant problems.

translated by 谷歌翻译

DUIDD: Deep-Unfolded Interleaved Detection and Decoding for MIMO Wireless Systems

Reinhard Wiesmayr , Chris Dick , Jakob Hoydis , Christoph Studer

分类：机器学习

2022-12-15

Iterative detection and decoding (IDD) is known to achieve near-capacity performance in multi-antenna wireless systems. We propose deep-unfolded interleaved detection and decoding (DUIDD), a new paradigm that reduces the complexity of IDD while achieving even lower error rates. DUIDD interleaves the inner stages of the data detector and channel decoder, which expedites convergence and reduces complexity. Furthermore, DUIDD applies deep unfolding to automatically optimize algorithmic hyperparameters, soft-information exchange, message damping, and state forwarding. We demonstrate the efficacy of DUIDD using NVIDIA's Sionna link-level simulator in a 5G-near multi-user MIMO-OFDM wireless system with a novel low-complexity soft-input soft-output data detector, an optimized low-density parity-check decoder, and channel vectors from a commercial ray-tracer. Our results show that DUIDD outperforms classical IDD both in terms of block error rate and computational complexity.

translated by 谷歌翻译

Diffusion Probabilistic Models beat GANs on Medical Images

Gustav Müller-Franzes , Jan Moritz Niehues , Firas Khader , Soroosh Tayebi Arasteh , Christoph Haarburger , Christiane Kuhl , Tianci Wang , Tianyu Han , Sven Nebelung , Jakob Nikolas Kather

分类：计算机视觉

2022-12-14

The success of Deep Learning applications critically depends on the quality and scale of the underlying training data. Generative adversarial networks (GANs) can generate arbitrary large datasets, but diversity and fidelity are limited, which has recently been addressed by denoising diffusion probabilistic models (DDPMs) whose superiority has been demonstrated on natural images. In this study, we propose Medfusion, a conditional latent DDPM for medical images. We compare our DDPM-based model against GAN-based models, which constitute the current state-of-the-art in the medical domain. Medfusion was trained and compared with (i) StyleGan-3 on n=101,442 images from the AIROGS challenge dataset to generate fundoscopies with and without glaucoma, (ii) ProGAN on n=191,027 from the CheXpert dataset to generate radiographs with and without cardiomegaly and (iii) wGAN on n=19,557 images from the CRCMS dataset to generate histopathological images with and without microsatellite stability. In the AIROGS, CRMCS, and CheXpert datasets, Medfusion achieved lower (=better) FID than the GANs (11.63 versus 20.43, 30.03 versus 49.26, and 17.28 versus 84.31). Also, fidelity (precision) and diversity (recall) were higher (=better) for Medfusion in all three datasets. Our study shows that DDPM are a superior alternative to GANs for image synthesis in the medical domain.

translated by 谷歌翻译

SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning

Benjamin Ellis , Skander Moalla , Mikayel Samvelyan , Mingfei Sun , Anuj Mahajan , Jakob N. Foerster , Shimon Whiteson

分类：机器学习

2022-12-14

The availability of challenging benchmarks has played a key role in the recent progress of machine learning. In cooperative multi-agent reinforcement learning, the StarCraft Multi-Agent Challenge (SMAC) has become a popular testbed for centralised training with decentralised execution. However, after years of sustained improvement on SMAC, algorithms now achieve near-perfect performance. In this work, we conduct new analysis demonstrating that SMAC is not sufficiently stochastic to require complex closed-loop policies. In particular, we show that an open-loop policy conditioned only on the timestep can achieve non-trivial win rates for many SMAC scenarios. To address this limitation, we introduce SMACv2, a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings (from the same distribution) during evaluation. We show that these changes ensure the benchmark requires the use of closed-loop policies. We evaluate state-of-the-art algorithms on SMACv2 and show that it presents significant challenges not present in the original benchmark. Our analysis illustrates that SMACv2 addresses the discovered deficiencies of SMAC and can help benchmark the next generation of MARL methods. Videos of training are available at https://sites.google.com/view/smacv2

translated by 谷歌翻译

Image Compression with Product Quantized Masked Image Modeling

Alaaeldin El-Nouby , Matthew J. Muckley , Karen Ullrich , Ivan Laptev , Jakob Verbeek , Hervé Jégou

分类：计算机视觉

2022-12-14

Recent neural compression methods have been based on the popular hyperprior framework. It relies on Scalar Quantization and offers a very strong compression performance. This contrasts from recent advances in image generation and representation learning, where Vector Quantization is more commonly employed. In this work, we attempt to bring these lines of research closer by revisiting vector quantization for image compression. We build upon the VQ-VAE framework and introduce several modifications. First, we replace the vanilla vector quantizer by a product quantizer. This intermediate solution between vector and scalar quantization allows for a much wider set of rate-distortion points: It implicitly defines high-quality quantizers that would otherwise require intractably large codebooks. Second, inspired by the success of Masked Image Modeling (MIM) in the context of self-supervised learning and generative image models, we propose a novel conditional entropy model which improves entropy coding by modelling the co-dependencies of the quantized latent codes. The resulting PQ-MIM model is surprisingly effective: its compression performance on par with recent hyperprior methods. It also outperforms HiFiC in terms of FID and KID metrics when optimized with perceptual losses (e.g. adversarial). Finally, since PQ-MIM is compatible with image generation frameworks, we show qualitatively that it can operate under a hybrid mode between compression and generation, with no further training or finetuning. As a result, we explore the extreme compression regime where an image is compressed into 200 bytes, i.e., less than a tweet.

translated by 谷歌翻译

Co-training $2^L$ Submodels for Visual Recognition

Hugo Touvron , Matthieu Cord , Maxime Oquab , Piotr Bojanowski , Jakob Verbeek , Hervé Jégou

分类：计算机视觉

2022-12-09

We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, ``submodels'', with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other, by providing a loss that complements the regular loss provided by the one-hot label. Our approach, dubbed cosub, uses a single set of weights, and does not involve a pre-trained external model or temporal averaging. Experimentally, we show that submodel co-training is effective to train backbones for recognition tasks such as image classification and semantic segmentation. Our approach is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNext. Our training strategy improves their results in comparable settings. For instance, a ViT-B pretrained with cosub on ImageNet-21k obtains 87.4% top-1 acc. @448 on ImageNet-val.

translated by 谷歌翻译

Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations

Björn Plüster , Jakob Ambsdorf , Lukas Braach , Jae Hee Lee , Stefan Wermter

分类：计算机视觉 | 自然语言处理

2022-12-08

Natural language explanations promise to offer intuitively understandable explanations of a neural network's decision process in complex vision-language tasks, as pursued in recent VL-NLE models. While current models offer impressive performance on task accuracy and explanation plausibility, they suffer from a range of issues: Some models feature a modular design where the explanation generation module is poorly integrated with a separate module for task-answer prediction, employ backbone models trained on limited sets of tasks, or incorporate ad hoc solutions to increase performance on single datasets. We propose to evade these limitations by applying recent advances in large-scale multi-task pretraining of generative Transformer models to the problem of VL-NLE tasks. Our approach outperforms recent models by a large margin, with human annotators preferring the generated explanations over the ground truth in two out of three evaluated datasets. As a novel challenge in VL-NLE research, we propose the problem of multi-task VL-NLE and show that jointly training on multiple tasks can increase the explanation quality. We discuss the ethical implications of high-quality NLE generation and other issues in recent VL-NLE research.

translated by 谷歌翻译